[WIP] Support qwen3-omni #4411

Draft
CUHKSZzxy wants to merge 22 commits into InternLM:main from CUHKSZzxy:support-qwen3-omni
Conversation


CUHKSZzxy (Collaborator) commented Mar 13, 2026

Summary

Support Qwen3-Omni thinker inference in the PyTorch backend.

This PR adds Qwen3-Omni model registration, HF processor integration, and multimodal preprocessing for image, video, audio, and mixed image/audio/video inputs. Audio input support is currently limited to Qwen3-Omni.

Changes

  • Add Qwen3-Omni PyTorch thinker model support.
  • Add Qwen3-Omni VL preprocessor using the shared get_input_prompt -> preprocess path.
  • Support image-only, video-only, audio-only, and mixed image/audio/video prompts.
  • Keep Qwen3-Omni video expansion as whole-video spans, distinct from Qwen3VL per-frame timestamp handling.
  • Add audio media parsing for OpenAI-style multimodal messages.
  • Add multimodal input docs and examples, including Qwen3-Omni audio usage.
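To make the audio media parsing change concrete, here is a minimal sketch of building an OpenAI-style multimodal user message that mixes image, audio, and text content parts. The `audio_url` part type mirrors the existing `image_url`/`video_url` conventions and is an assumption for illustration, as are the helper name and example URLs; consult the docs added in this PR for the exact accepted schema.

```python
def make_mixed_message(text, image_url=None, audio_url=None, video_url=None):
    """Build one OpenAI-style multimodal user message.

    Each media kind becomes a typed content part; the text prompt
    goes last, matching common OpenAI-style message layouts.
    """
    content = []
    if image_url:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    if audio_url:
        # Assumed part type, by analogy with image_url/video_url.
        content.append({"type": "audio_url", "audio_url": {"url": audio_url}})
    if video_url:
        content.append({"type": "video_url", "video_url": {"url": video_url}})
    content.append({"type": "text", "text": text})
    return {"role": "user", "content": content}


# Hypothetical example inputs for a mixed image/audio prompt.
msg = make_mixed_message(
    "Describe what you see and hear.",
    image_url="https://example.com/cat.jpg",
    audio_url="https://example.com/meow.wav",
)
```

A message like this would then be passed as part of the usual `messages` list to the pipeline or the OpenAI-compatible server endpoint.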

Notes

  • Talker/audio-generation support is not included.
  • Audio input support is scoped to Qwen3-Omni.
  • Advanced use_audio_in_video=True interleaving is not enabled in this patch.

Related

Prerequisite PR

CUHKSZzxy force-pushed the support-qwen3-omni branch from 8d64a7a to 4c6bc99 on March 19, 2026 at 07:13.